At useR! 2017 I saw a poster that blew my mind. It was Alex Kruse’s map of Hamburg that showed common routes bike share users would take. I’ve always wanted to reproduce it so here I am doing my best to make something similar with DC’s Capital Bike Share open data.
Most of the techniques I use here were learned by reading Geocomputation with R, exploring Stack Overflow, and by reading different vignettes from the R community.
library(tidyverse)
library(tmaptools)
library(tmap)
library(osmdata)
library(stplanr)
library(igraph)
library(sf)
Data can be found here.
dc <- st_read("Neighborhood_Clusters/Neighborhood_Clusters.shp")
## Reading layer `Neighborhood_Clusters' from data source `/Users/ben/Neighborhood_Clusters/Neighborhood_Clusters.shp' using driver `ESRI Shapefile'
## Simple feature collection with 46 features and 7 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -77.1198 ymin: 38.79164 xmax: -76.90915 ymax: 38.99597
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
summary(dc)
## OBJECTID WEB_URL NAME
## Min. : 1.00 http://planning.dc.gov/:39 Cluster 1 : 1
## 1st Qu.:12.25 NA's : 7 Cluster 10: 1
## Median :23.50 Cluster 11: 1
## Mean :23.50 Cluster 12: 1
## 3rd Qu.:34.75 Cluster 13: 1
## Max. :46.00 Cluster 14: 1
## (Other) :40
## NBH_NAMES Shape_Leng
## Arboretum, Anacostia River : 1 Min. : 1990
## Brightwood Park, Crestwood, Petworth : 1 1st Qu.: 6920
## Brookland, Brentwood, Langdon : 1 Median : 8883
## Capitol Hill, Lincoln Park : 1 Mean :10457
## Capitol View, Marshall Heights, Benning Heights: 1 3rd Qu.:10654
## Cathedral Heights, McLean Gardens, Glover Park : 1 Max. :57862
## (Other) :40
## Shape_Area TYPE geometry
## Min. : 312062 Additional: 7 POLYGON :46
## 1st Qu.: 1943120 Original :39 epsg:4326 : 0
## Median : 3433941 +proj=long...: 0
## Mean : 3857591
## 3rd Qu.: 4497491
## Max. :21175571
##
tm_shape(dc) +
tm_borders('hotpink') +
tm_layout(title = 'DC Neighborhood Clusters',
title.color = 'white',
bg.color = 'grey5')
#create a bbox of dc
dc_bbox <- st_bbox(dc)
#create a simple feature collection of major roads in dc
dc_roads <- c('primary', 'secondary', 'tertiary', 'cycleway') %>%
map(function(x){
opq(bbox = dc_bbox) %>%
add_osm_feature(key = 'highway', value = x) %>%
osmdata_sf() %>%
.$osm_lines %>%
select(geometry, highway)
}) %>%
do.call(rbind, .)
tm_shape(dc) +
tm_borders('hotpink4', 1) +
tm_fill('hotpink3', .05) +
tm_shape(filter(dc_roads, highway == 'primary')) +
tm_lines('grey100', 1) +
tm_shape(filter(dc_roads, highway == 'secondary')) +
tm_lines('grey75', .5) +
tm_shape(filter(dc_roads, highway == 'tertiary')) +
tm_lines('grey50', .5) +
tm_shape(filter(dc_roads, highway == 'cycleway')) +
tm_lines('grey25', .5) +
tm_layout(title = 'DC Roads',
title.color = 'white',
bg.color = 'grey5')
So the previous image contained path information. We can combine the path information with the original Bike Share data to see which routes most bikers were riding on.
dc_routes_wt <- dc_routes %>%
mutate(start = dc_desire$start,
end = dc_desire$end) %>%
left_join(trips_2017_clean)
## Joining, by = c("start", "end")
summary(dc_routes_wt)
## distance duration error id
## Min. : 77.7 Min. : 16.5 Length:119526 Length:119526
## 1st Qu.: 1659.0 1st Qu.: 326.4 Class :character Class :character
## Median : 2568.7 Median : 470.2 Mode :character Mode :character
## Mean : 3020.5 Mean : 487.4
## 3rd Qu.: 3805.0 3rd Qu.: 629.2
## Max. :30448.3 Max. :1626.0
## NA's :38 NA's :38
## start end month nn
## Length:119526 Length:119526 Oct :10532 Min. : 1.00
## Class :character Class :character Sep :10518 1st Qu.: 8.00
## Mode :character Mode :character Aug :10453 Median : 16.00
## Nov :10333 Mean : 25.84
## Jul :10324 3rd Qu.: 29.00
## Jun :10256 Max. :1626.00
## (Other):57110
## geometry
## MULTILINESTRING:119526
## epsg:4326 : 0
## +proj=long... : 0
##
##
##
##
For plotting and data cleaning purposes we will only keep routes with more than 200 riders.
dc_routes_wt <- dc_routes_wt %>%
filter(nn > 200)
There is a big problem however, many routes pass over other routes. If we plot them as they are, then there will be overlapping lines and the information stored on the covered lines will be hidden. What we can do to solve this problem is aggregate the information on the lines. This can be done by identifying overlapping lines and aggregatting the underlying values on those lines.
The below aggregation method was inspired by a stack overflow respondent named ibusett. The original answer can be found here.
dc_routes_intersect <- dc_routes_wt$month %>%
levels() %>%
map(function(x){
filter(dc_routes_wt,
month == x) %>%
st_intersection()
}) %>%
do.call(rbind, .)
dc_routes_month <- dc_routes_wt$month %>%
levels() %>%
{
month_list <- map(., function(x){
filter(dc_routes_wt, month == x)
})
names(month_list) <- .
month_list
}
dc_routes_agg <- dc_routes_intersect %>%
pmap_int(function(origins, month, ...){
sum(dc_routes_month[[month]][origins,]$nn)
})
dc_routes_intersect$agg <- dc_routes_agg
Since we will only plot the lines, we will only need to keep the LINESTRING geometries from the returned geometry collection.
dc_routes_linestrings <-dc_routes_intersect %>%
st_collection_extract('LINESTRING')
Because we have arranged our data by month, we can facet the images by month. We can either create a single image by faceting or we can create an animation. Below is the code for the animation.
trips_2017_anim <- tm_shape(filter(dc_roads, highway == 'primary')) +
tm_lines('grey40', .1) +
tm_shape(filter(dc_roads, highway == 'secondary')) +
tm_lines('grey30', .5) +
tm_shape(filter(dc_roads, highway == 'tertiary')) +
tm_lines('grey20', .5) +
tm_shape(dc_routes_linestrings) +
tm_lines(col = 'agg',
1,
title.col = '# Bikes',
breaks = c(0, 1000, 2000, Inf),
palette = c('#000dff', '#8086ff', "#ffffff"),
labels = c('< 1k', '< 2k', '> 2k')) +
tm_facets(along = 'month', free.coords = F, drop.units = T)+
tm_layout(bg.color = 'grey10',
legend.text.color = 'white',
title = "2017 DC Capital Bike Share Routes",
title.color = 'white')
The monthly mapped routes of DC bike share. High density riding seems to happend during the summer months (June, July, Aug).
Something I’m really excited about is the concept of the geo graph. With the spatial data we already have, we can create a graph where the nodes are the intersections and the edges are the routes.
sln <- dc_routes_linestrings %>%
as('Spatial') %>%
SpatialLinesNetwork()
E(sln@g)$agg <- dc_routes_linestrings$agg
E(sln@g)$month <- as.character(dc_routes_linestrings$month)
V(sln@g)$lon <- sln@g$x
V(sln@g)$lat <- sln@g$y
V(sln@g)$strength <- strength(sln@g, weights = E(sln@g)$agg)
V(sln@g)$btwn <- betweenness(sln@g)
sln@g %>%
{. - E(.)[month != "Jun"]} %>%
{. - V(.)[degree(.) == 0]} %>%
plot(.,
layout = matrix(c(V(.)$lon, V(.)$lat), nrow = vcount(.)),
vertex.size = map_dbl(V(.)$strength / max(V(.)$strength), function(x){
if(x < .33) return(2)
if(x < .66) return(6)
return(12)
}),
vertex.label = '',
vertex.color = map_chr(V(.)$strength / max(V(.)$strength), function(x){
if(x < .33) return('navy')
if(x < .66) return('cyan')
return('gold')
}),
main = "DC Bike Share: Intersection Betweeness June 2017")
V(sln@g) %>%
.[[strength == max(.$strength)]]
## + 1/603 vertex, from b2234fc:
## lon lat strength btwn
## 85 -77.05009 38.887 103211 1765
The intersection of Independence Avenue & Ohio Drive SW by the Lincoln and John Ericsson memorials may be worth looking into since it is one of the most crossed intersections.
Mixing graph analysis with spatial analysis will be a lot of fun and I plan on pursuing this type of joint analysis in the future.